Abstract


There  have  been  numerous  undocumented  reports  that  military  users  of 
night  vision  goggles  (NVGs)  tend  to  talk  louder  than  usual  when  they  wear 
the  viewing  device.  Increased  voice  level  in  response  to  using  the  night 
vision  aid  could  seriously  compromise  the  security  of  military  missions 
that  depend  upon  stealth  for  their  success.  The  goal  of  this  study  was  to 
investigate  the  effects  of  characteristics  of  the  NVGs  such  as  display 
resolution,  field  of  view,  and  physical  constraint  on  the  voice  level  of  NVG 
users  as  the  users  described  military  activity  of  potential  targets  seen 
during  a  visual  target  acquisition  task.  The  experiment  was  conducted 
indoors  without  the  presence  of  situational  variables  or  psychological 
stressors  ordinarily  found  in  the  field.  The  authors  wished  to  determine 
whether  voice  level  depended  on  the  physical  characteristics  of  the  NVGs. 
No  effect  of  physical  characteristics  of  the  NVGs  was  observed.  The 
influence  of  situational  variables  on  the  vocal  output  of  NVG  users  will  be 
examined  in  a  future  experiment.  Some  aspects  concerning  the  procedures 
used  to  measure  voice  levels  and  to  develop  a  realistic  visual  target 
acquisition  task  are  discussed. 


ACKNOWLEDGMENTS 


The  authors  express  their  utmost  appreciation  to  Mark  Kregel  of  K-Teck  Services 
who  created,  directed,  and  produced  the  videos  used  for  this  study  and  played  the  role  of  the 
moving  soldier  in  them.  His  sensitivity  to  the  needs  of  the  study,  attention  to  insightful  detail, 
and  his  resourcefulness  and  unlimited  energy  allowed  everything  to  happen  on  schedule.  The 
authors  also  thank  LaShawna  Wright  of  ARL  for  help  in  many  hours  of  data  reduction,  and  they 
thank  Ms.  Wright  and  Rachel  Jones  of  ARL  for  their  competent  assistance  in  collecting  data. 
The  authors  further  thank  Joseph  Mazurczak  and  Krishna  Pillalamarri,  of  ARL,  for  their  efforts 
in  preparing  the  theater  facility. 




CONTENTS

INTRODUCTION
OBJECTIVES
METHOD
    General
    Participants
    Visual Stimuli
    Stimulus Selection
    Viewing Condition
    Apparatus
    Procedure
    Experimental Design
RESULTS AND DISCUSSION
    The Effect of NVGs
    The Effect of Perceived Threat
GENERAL DISCUSSION
REFERENCES
APPENDIX
    A. Instructions
DISTRIBUTION LIST
REPORT DOCUMENTATION PAGE

TABLES

1. Rms Voice Level (dBA) for Scene Content
2. Rms Voice Level (dBA) for Identification Number
3. Rms Peak Voice Level Within Scenes
4. Ratings by 10 Judges of Perceived Threat for 30 Scenes
5. Rms Voice Level (dBA) for Scene Content on High Threat (T5) and Low Threat (T2) Scenes Only
6. Rms Voice Level (dBA) for Identification Numbers on High Threat (T5) and Low Threat (T2) Scenes Only
7. Rms Peak Voice Levels (dBA) in High Threat (T5) and Low Threat (T2) Scenes Only




THE  EFFECT  OF  WEARING  NIGHT  VISION  GOGGLES  ON  VOICE  LEVEL 
DURING  A  VISUAL  TARGET  ACQUISITION  TASK 


INTRODUCTION 

Night  vision  goggles  (NVGs)  have  been  of  inestimable  value  to  the  U.S.  armed  forces 
and  to  civilian  organizations  concerned  with  security.  For  a  major  application  of  NVGs  in  a 
military  context  such  as  nighttime  surveillance,  the  use  of  the  visual  aid  is  obviously  superior 
to  using  none  at  all.  Although  NVGs  possess  some  potential  visual  shortcomings  such  as 
limited  resolution  equivalent  to  having  a  visual  acuity  of  only  20/40  (as  measured  with  a 
Snellen  eye  chart)  and  a  limited  field  of  view  (FOV)  of  only  40°,  such  factors  probably  prove 
to  be  minor  limitations  when  compared  to  the  overall  usefulness  of  NVGs.  However,  there 
may  be  a  property  of  NVGs  that  is  potentially  critical  to  the  safety  of  the  user.  There  have 
been  numerous  undocumented  reports  that  users  of  the  device  have  a  tendency  to  talk 
noticeably  louder  while  wearing  NVGs  than  when  they  are  not  wearing  them,  even  though  the 
users  are  fully  aware  of  the  effect.  Such  accounts  have  been  given  by  (among  others)  members 
of  the  Army’s  special  forces.  Some  of  their  personnel  claim  that  they  are  aware  of  a  voice 
effect,  and  during  stealth  conditions,  they  try  not  to  use  NVGs  if  possible.  The  effect 
potentially  compromises  a  military  mission  that  depends  upon  stealth  for  its  success.  If  it 
can  be  substantiated,  the  phenomenon  poses  some  fundamental  questions  about  why  the  voice 
level  of  a  user  might  be  affected  by  NVGs  even  though  access  to  the  ears  of  the  user  appears 
to  be  unobstructed  by  the  device. 

Many  hypotheses  can  be  devised  to  explain  or  speculate  about  the  phenomenon  of 
talking  loudly  when  NVGs  are  worn.  One  is  that  the  user’s  task  of  interpreting  the  world, 
as  seen  through  a  small,  bright  display  with  a  limited  FOV,  while  the  user  is  immersed  in  the 
surrounding  darkness,  may  produce  a  cognitive  or  attentional  tunneling  effect  (Wickens, 
Thomas,  Merlo,  &  Hah,  1999;  Yeh  &  Wickens,  1999)  that  leads  to  a  deficit  in  peripheral 
sensory  awareness.  This  effect  might  give  the  user  a  sense  of  isolation  that  must  be  overcome. 
Overcoming  that  isolation  while  talking  with  another  person  at  an  unknown  location  in  the 
surrounding  darkness  perhaps  translates  into  an  increase  in  voice  level.  Another  hypothesis 
concerning  the  phenomenon  involves  the  stress  level  of  the  NVG  user  (National  Research 
Council,  1997).  Such  a  device  would  normally  be  required  on  a  night  mission  during  periods  of 
high  risk.  It  is  easy  to  imagine  that  the  mix  of  potential  danger  with  the  need  for  stealth  and 
the  responsibility  for  the  safety  of  others  might  exaggerate  the  sound  of  any  voice  above  a 
whisper.  Other  hypotheses  about  the  phenomenon  involve  notions  that  voices  may  sound 
louder  in  the  dark  because  there  are  fewer  visual  distractions,  or  perhaps  the  urgency  of 
warning  others  of  potential  danger  may  increase  the  voice  level.  The  feel  of  the  NVGs’  being 
worn  on  the  face  may  produce  a  need  to  overcome  the  obstacle  by  talking  louder.  Perhaps  the 
use  of  a  head-mounted  display  with  a  limited  FOV  and  the  need  to  keep  a  continuing  visual 
event  in  constant  sight  exaggerate  the  necessity  of  not  turning  away  from  the  event  to  talk 
with  another  person  for  fear  of  losing  visual  orientation  in  the  display.  The  need  to  retain  the 
head  position  may  cause  the  voice  level  to  rise  during  the  relayed  reporting  of  the  visual  event. 
Finally,  in  a  nighttime,  military  field  situation,  because  of  the  necessity  for  stealth,  talking  is 
infrequent.  Occasionally,  urgent  commands  such  as  “stop,”  “wait,”  or  “get  down”  may 
simply sound louder than normal in the quiet of the night. Surprisingly, these issues did not
trigger  much  research  interest  in  the  past  and  no  reliable  data  are  available. 

Identifying the cause of increased voice level when NVGs are worn, if the phenomenon
exists at all, is important for both the user and the NVG designer so that the effect can be
minimized or eliminated for the safety and survival of the user. Countermeasures may include
side-tone amplification (Chang-Yit, Pick, & Siegel, 1975), NVG redesign, or specialized training,
and they may differ in both the cost and the time needed to implement them.


OBJECTIVES 

The  primary  objective  of  this  study  was  to  determine  if  the  phenomenon  of  increased 
voice  level  associated  with  NVG  use  could  be  reproduced  in  the  laboratory  during  more 
controlled  conditions  than  would  be  available  in  the  field.  If  so,  this  study  of  the  phenomenon 
would  not  only  be  greatly  simplified  but  might  also  provide  specific  information  about 
whether  it  had  any  basis  in  the  purely  physical  aspects  of  using  NVG  displays.  In  other 
words,  the  authors  wished  to  determine  if  such  elements  as  limited  FOV  and  resolution,  the 
weight  of  the  display  or  the  restriction  of  its  supporting  head  harness  played  significant  roles 
in  the  phenomenon.  Secondary  objectives  of  the  study  were  to 

1. Construct a realistic, visual target acquisition task that involved moving targets at
close  range  (on  average,  20  m)  in  order  to  simulate  the  domain  within  which  the  phenomenon 
was  reported,  and 

2.  Develop  a  method  for  measuring  speech  levels  that  might  occur  during  such  a  task. 

These  objectives,  if  successfully  achieved,  could  have  further  applications  in  studying  the 
behavior  of  soldiers  in  their  environment. 


METHOD 

General 

The  critical  elements  of  viewing  condition  and  scene  characteristics  that  might 
contribute  to  the  phenomenon  of  talking  loud  are  unknown.  Therefore,  our  approach  was  to 
investigate  the  effects  of  NVG  characteristics  such  as  display  resolution,  limited  FOV,  and 
the physical constraint of the NVG harness itself. Incidentally, the weight of the device
(close to 700 grams) extends outward from the head of the observer and creates a load vector
that must be supported by a substantial, tightly worn head harness. For scene
characteristics,  the  concern  was  to  employ  as  many  dynamic  visual  elements  in  the  scene 
content  as  possible,  which  might  exist  in  a  natural  outdoor  environment. 

A  dynamic,  target  acquisition  task  incorporating  elements  of  target  motion  and 
uncertainty  of  target  event  occurrence  was  chosen  for  use  to  keep  within  the  context  of  what  a 
realistic  night  vision  scene  might  presumably  encompass.  It  was  deemed  necessary  to  include 
task elements requiring visual observation, cognitive processing, and, of course, voice
communication.  It  was  also  necessary  to  present  participants  with  a  task  that  elicited  sufficient 
involvement  to  immerse  them  in  the  context  of  playing  the  role  of  an  actual  observer  in  the  field 
who  was  required  to  communicate  a  running  description  of  a  changing  scene  to  another  person. 


Participants 

Twenty male participants (herein called “observers”) between 18 and 53 years of age,
with a mean of 34 years, from various National Guard units, volunteered and were paid for
their participation in this experiment. All observers had at least 20/30 visual acuity
(corrected or uncorrected) in both eyes and normal stereoscopic vision. A Titmus® II vision
tester was used to screen the observers. In addition, all observers had pure tone hearing
thresholds within normal limits (i.e., better than 25 dB HL¹ for octave frequencies from 250
through 8000 Hz as per American National Standards Institute [ANSI] S3.6-1996). Hearing
testing  was  conducted  in  a  room  with  background  noise  levels  complying  with  clinical 
requirements  for  earphone  testing  (ANSI,  1991). 


Visual  Stimuli 

The  stimulus  set  consisted  of  front-projected  images  of  color  video  scenes  of  wooded 
areas  showing  “live  footage”  of  a  militarily  relevant  activity.  The  average  scene  duration  was 
about  20  seconds,  with  a  standard  deviation  of  3  seconds.  A  four-digit  identification  number 
was  first  shown  in  the  center  of  the  viewing  area,  about  5  seconds  before  the  onset  of  each 
scene,  and  lasted  for  about  4  seconds.  This  number  served  as  (a)  an  initial  visual  fixation  point 
for  the  observer,  (b)  a  label  for  locating  the  correct  scene  on  the  observer’s  voice  recording  of 
the  stimuli,  and  (c)  a  neutral  stimulus  as  a  verbal  reference  for  evaluating  voice  levels  associated 
with different viewing conditions. Because of the large number and variety of scene
characteristics that the authors wished to include as elements in the stimuli, a great number of reasonably
realistic  scenarios  were  recorded  in  order  to  select  from  those  scenes  the  ones  that  best  fit  the 
requirements.  About  55  scenes  were  videotaped. 

A  typical  video  scene  contained  various  combinations  of  a  moving  soldier  and 
silhouettes  of  stationary  soldiers.  Static  and  dynamic  scene  variables  involved  the  presence 
or  absence  of  a  moving  target,  the  number  of  stationary  targets  (none,  one,  or  two),  and  the 
apparent  distances  of  the  targets  (from  5  to  50  meters)  to  the  observer  viewing  the  scene. 

Other  scene  characteristics  were  the  denseness  of  the  foliage  (low,  medium,  or  high)  in  the 
vicinity  of  the  targets  and  the  proportion  of  the  moving  target  (low,  medium,  or  high) 
obscured  by  the  foliage.  More  dynamic  elements  involved  the  amount  of  movement  (low, 
medium,  or  high)  of  the  foliage  caused  by  wind,  and  the  amount  (fast  or  slow  movement)  and 
type  of  movement  of  the  target  (e.g.,  either  approaching  the  observer  or  moving  across  the 
scene  relative  to  the  observer’s  line  of  sight).  Important  target  characteristics  concerned  the 
physical behavior of the moving soldier in the scene and whether a weapon was present and
being held in a neutral position or being aimed. The soldier in the scene could aim the weapon
in any direction and could appear to aim at the observer.

¹ Hearing level: a logarithmic measure of hearing loss in reference to a standardized threshold level (ANSI
1994).


Stimulus  Selection 

After  the  55  scenes  were  videotaped,  five  people  acting  as  judges  viewed  each  scene 
without  visual  aids  and  rated  it  for  the  relative  difficulty  of  detecting  the  moving  target.  Thirty 
scenes  of  the  55  were  selected  for  use  as  final  stimuli  on  the  basis  of  creating  a  distribution  of 
scene  difficulty  ratings  from  moderately  difficult  to  very  difficult.  Those  scenes  in  which  the 
target  was  highly  visible  (i.e.,  it  presented  no  search  challenge  to  the  observer)  were  not  chosen. 
Scenes  were  selected  according  to  (a)  how  much  time  it  took  to  discover  the  moving  target  and 
the  silhouettes  and  (b)  how  broad  the  representation  was  of  the  previously  mentioned  scene 
characteristics.  Some  scenes  with  no  target  were  also  chosen.  The  final  group  of  30  scenes  was 
then  divided  into  five  groups  of  six  scenes  each  so  that  there  was  a  reasonable  balance  of  the 
relevant  stimulus  elements  in  each  group.  Five  video  cassettes  of  the  scene  groups  were 
prepared  to  facilitate  randomization,  and  these  were  shown  to  the  observers  during  each  of  the 
different  viewing  conditions. 


Viewing  Condition 

Each  of  the  observers  viewed  the  videos  during  five  different  conditions: 

1.  NVGs  with  reduced  resolution  equivalent  to  a  visual  acuity  of  20/70  (Snellen); 

2.  NVGs  with  normal  resolution  equivalent  to  20/40  (Snellen); 

3.  Mock  goggles  possessing  the  physical  characteristics  of  weight  (approximately 
700  grams)  and  FOV  (40°)  similar  to  the  NVGs  but  without  optics; 

4.  The  NVG  harness  alone  with  no  goggles  attached;  and 

5.  No  viewing  apparatus  worn  on  the  head. 


Apparatus 

A  theater-type  environment  was  used  to  present  the  stimuli.  It  consisted  of  a  large 
(V = 1500 m³; 17 m long by 13 m wide by 7 m high), acoustically treated room with
reverberation time (RT) < 0.5 second in the frequency range from 125 to 4000 Hz. The center
region of this space was an acceptable approximation of outdoor listening conditions for low-
and mid-level acoustic stimulation. A low-level (40 to 50 dBA) recording of natural outdoor
woodland  sounds  was  used  for  the  background  noise  to  boost  the  realism  of  the  simulation 
and  to  mask  any  distracting  noises. 

The instrumentation used to present the stimuli included a video tape deck, a video
projector,  and  a  20-ft  by  13-ft  projection  screen  at  a  viewing  distance  of  20  feet  from  the 
observer.  The  video  projection  equipment  was  located  behind  the  observer  in  a  projection 
room  acoustically  isolated  from  the  theater  environment.  A  secure  viewing  platform,  4  feet 
high,  was  built  on  the  theater  floor  to  allow  the  observer’s  eye  level  to  be  even  with  the  eye 
level  of  the  soldiers  depicted  in  the  video  scenes,  at  about  the  center  of  the  screen  height.  The 
soldier  in  the  scenes  nearest  the  camera  that  recorded  his  activity  was  (on  average)  photographed 
at  a  similar  distance  from  the  camera  as  the  observer-to-screen  viewing  distance.  The  visual 
angle  subtended  by  a  target  on  the  screen  was  similar  to  the  visual  angle  subtended  in  the  actual 
environment. 

The instrumentation used for the viewing conditions consisted of biocular AN/PVS-7B
NVGs  with  head  harness.  The  goggles  used  an  integral  battery  compartment  that  contained 
two  standard  AA  batteries.  The  total  head-borne  weight  of  the  device  was  695  grams.  In  order 
for  the  NVGs  to  be  used  to  view  projected  videos,  a  lens  cap  with  a  pinhole  was  used  on  the 
object  lens.  Normal  resolution  of  the  goggles,  with  or  without  a  pinhole,  was  equivalent  to  the 
wearer  having  a  visual  acuity  of  20/40,  Snellen.  In  the  20/70  Snellen  condition,  adding  diffusing 
material over the pinhole reduced the viewing acuity. A lens cap with either a clear or a
diffusing pinhole was used for the two NVG viewing conditions. In-house fabricated mock
goggles  that  simply  limited  the  FOV  to  40°  without  optics  and  that  had  the  same  weight  as  the 
NVGs  were  used  as  a  third  viewing  condition.  A  fourth  viewing  condition  consisted  of  the 
NVG  harness  by  itself.  NVGs  were  first  harnessed  onto  the  head  of  the  observer  to  firmly 
secure  them;  then,  the  goggles  were  removed,  leaving  only  the  harness  in  place. 

The instrumentation used to record the observer’s verbal reports consisted of a
microphone, a digital audio tape recorder, and an audio calibration device. The
instrumentation used for storing, processing, and analyzing the voice data consisted of an IBM 586
computer and monitor, signal-editing software, and software for measuring sound quality.


Procedure 

Each  observer  was  shown  short  videos,  each  approximately  20  seconds  in  duration, 
of  wooded  scenes.  The  observer’s  task  was  to  continuously  search  for  and  verbally  report 
the  occurrence  (as  it  was  happening)  of  any  activity  of  military  significance  during  the  scene 
video.  A  typical  military  activity  might  consist  of  a  soldier  appearing,  holding  a  weapon,  and 
moving  in  some  direction.  The  soldier  in  the  scene  sometimes  appeared  to  aim  the  weapon 
toward  the  observer.  In  some  scenes,  silhouettes  of  soldiers  appeared  as  well. 

Each  observer  was  asked  to  imagine  that  he  was  on  a  military  night  mission,  leading 
a  small  squad  of  soldiers  through  a  wooded  terrain.  He  was  the  only  person  in  the  squad 
wearing  NVGs  and  had  to  be  the  “eyes  of  the  squad,”  required  to  report  anything  of 
importance  so  that  danger  could  be  successfully  avoided.  No  reference  to  “stealth”  was 
mentioned.  The  experimenter  played  the  role  of  a  squad  member  immediately  behind  the 
observer.  The  four-digit  identification  number  that  preceded  each  scene  was  first  reported. 
When the scene began, the observer reported any important terrain details such as hills,
gullies, or undergrowth that might have to be avoided, or the presence of any paths through
the woods. Upon sighting a target, the observer was instructed to immediately report what
the target was, such as a moving soldier. This was to be followed by a report of the apparent
location of the target according to clock values ranging from 9 o’clock (the left side of the
view), through 12 o’clock (straight ahead), to 3 o’clock (the right side of the view). After
reporting  the  location,  the  observer  was  to  estimate  and  report  the  apparent  distance  of  the 
target  from  the  viewing  position.  This  information  was  to  be  followed  by  a  description  of 
what  the  target  was  doing.  The  observer  was  to  report  the  soldier’s  direction  of  movement, 
if  any,  in  which  direction  a  rifle  was  aiming,  if  the  soldier  appeared  to  be  holding  a  rifle,  and 
especially,  if  the  soldier  in  the  scene  appeared  to  see  the  observer. 

The  observers  were  shown  the  videos  in  all  five  viewing  conditions.  For  all  conditions, 
they  wore  a  soldier’s  field  cap  with  a  microphone  attached  to  its  brim  and  connected  to  a  small 
tape  recorder  in  a  pouch  worn  over  the  shoulder.  The  microphone  was  always  at  the  same 
distance  from  the  observer’s  mouth,  with  no  restriction  of  head  movement.  Observers  were 
told  that  their  verbal  descriptions  were  being  recorded  for  future  analysis  of  the  kinds  of 
descriptions  soldiers  used  in  such  a  task.  At  no  time  were  they  told  that  the  authors  were 
interested  in  how  loudly  they  were  talking  nor  was  a  reference  ever  made  to  any  scenario 
requiring  stealth. 

During  the  practice  session  in  which  three  scenes  were  used  to  instruct  the  observer 
about  the  kind  of  information  to  report,  no  goggles  were  worn.  The  configuration  of  the 
viewing  area  required  the  experimenter  to  be  situated  behind  a  partition  8  feet  to  the  rear  and 
side  of  the  viewing  platform  upon  which  the  observer  stood.  The  observer’s  reports  were 
digitally  recorded  during  all  sessions  in  order  to  document  their  verbal  content,  temporal 
characteristics,  and  voice  level.  Observers  were  told  that  the  experimenter,  playing  the  role  of 
a  squad  member,  was  similarly  unable  to  see  each  scene  and  would  be  situated  behind  a 
partition,  making  notes  about  what  kind  of  information  was  being  verbally  communicated. 

The  stimulus  materials  were  front  projected  onto  the  13-  by  20-foot  viewing  screen  in 
the  darkened  room.  The  observer  stood  on  the  viewing  platform  at  approximately  eye  level 
with  the  center  of  the  screen  at  a  distance  of  20  feet  from  the  screen  and  described  to  the 
experimenter  what  was  seen.  This  geometry  produced  viewing  angles  with  the  screen  of 
approximately  36°  by  53°.  The  viewing  angle  of  the  NVGs  was  approximately  40°,  thus 
requiring  the  head  to  move  horizontally  to  see  the  whole  scene.  For  each  viewing  condition, 
the  observer  donned  the  appropriate  viewing  apparatus  and  observed  six  video  scenes. 
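
The viewing angles quoted above follow from the screen dimensions and the viewing distance. The short calculation below is an illustrative check, not part of the original procedure; the function name `viewing_angle_deg` is introduced here for illustration, and the observer is assumed to be centered on the screen.

```python
import math

def viewing_angle_deg(extent, distance):
    """Full angle subtended by a centered extent seen from a given distance (same units)."""
    return math.degrees(2.0 * math.atan((extent / 2.0) / distance))

# Screen of 13 ft by 20 ft viewed from 20 ft, as stated in the text.
vertical = viewing_angle_deg(13.0, 20.0)
horizontal = viewing_angle_deg(20.0, 20.0)

print(f"vertical: {vertical:.1f} deg, horizontal: {horizontal:.1f} deg")
```

Because the horizontal angle exceeds the 40° FOV of the goggles, the calculation is consistent with the statement that head movement was needed to see the whole scene.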


Experimental  Design 

A within-subjects design was used for the independent variable of viewing condition
with five levels. The viewing conditions were presented in a counterbalanced design, based
on two 5 by 5 Greco-Latin squares so that each viewing condition followed every other
condition exactly two times. The five video cassettes were paired with the viewing
conditions in a similar design, also based upon two superimposed 5 by 5 Greco-Latin squares.
Each video cassette both followed and preceded every other video cassette equally often
and was shown an equal number of times with each of the viewing conditions across
the 20 observers.
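
The balancing property described above (each condition following every other condition exactly twice) can be illustrated with a standard construction. The sketch below builds a Williams design for five conditions: a cyclic Latin square plus its mirror image. This is an assumption made for illustration only; it is not the authors' actual pair of Greco-Latin squares, and the function names are hypothetical. The condition labels are those used in the column headings of Tables 1 and 2.

```python
from collections import Counter

def williams_sequences(n):
    """Return 2n orderings of range(n): a cyclic Latin square and its mirror image."""
    # First row interleaves low and high values: 0, 1, n-1, 2, n-2, ...
    first = [0]
    lo, hi = 1, n - 1
    while lo <= hi:
        first.append(lo)
        if lo != hi:
            first.append(hi)
        lo += 1
        hi -= 1
    square = [[(x + shift) % n for x in first] for shift in range(n)]
    mirror = [row[::-1] for row in square]
    return square + mirror

def carryover_counts(sequences):
    """Count how often condition b immediately follows condition a."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts

labels = ["70NVG", "40NVG", "MOCK", "HARN", "NONE"]
seqs = williams_sequences(len(labels))
counts = carryover_counts(seqs)

for seq in seqs:
    print(" -> ".join(labels[i] for i in seq))
```

In the resulting 10 orderings, every ordered pair of distinct conditions occurs as an immediate succession exactly twice, which is the property the text describes.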

The  dependent  measure  was  the  voice  level  of  the  observer.  A  calibration  signal 
(recorded  before  the  observer  entered  the  theater)  was  followed  by  the  voice  of  the  observer 
reading  the  scene  identification  numbers  and  the  verbal  descriptions  of  the  scene  contents, 
which were tape recorded and later stored in computer data files for analysis. The
signal-editing software was used to display individual voice files on the computer screen and to
measure  appropriate  sound  levels  during  the  data  analysis.  For  the  verbal  description  of  the 
scene  content,  all  pauses  in  the  sound  record  were  first  removed  to  reduce  any  effect  of 
silence  on  the  measurement  of  the  mean  voice  level  for  a  scene.  A  pause  was  defined  as  the 
cessation  of  speech  between  phrases  or  while  the  observer  was  waiting  for  some  activity  to 
occur  in  the  scene.  After  the  pauses  were  edited,  the  final  sound  record  of  an  observer 
consisted  of  all  the  words  and  phrases  appearing  at  approximately  equal  time  intervals.  No 
sound  editing  was  performed  on  the  verbalizations  of  the  scene  identification  numbers. 
Sound levels were measured as root mean square (rms) energy levels in dBA units².
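
The measurement chain described above (calibration signal, pause removal, rms level) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: the pause detector is a simple frame-energy threshold standing in for the editing done in the signal-editing software, A-weighting is not applied, and the function names, the 94 dB calibration level, and the synthetic signals are all hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def remove_pauses(samples, frame_len, floor):
    """Drop whole frames whose rms is below `floor` (assumed stand-in for pause editing)."""
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if rms(frame) >= floor:
            kept.extend(frame)
    return kept

def level_db(samples, cal_samples, cal_level_db):
    """Level in dB, referenced to a calibration recording of known level."""
    return cal_level_db + 20.0 * math.log10(rms(samples) / rms(cal_samples))

def tone(amplitude, n, freq=1000.0, rate=8000.0):
    """Synthetic sine tone used here in place of real recordings."""
    return [amplitude * math.sin(2.0 * math.pi * freq * i / rate) for i in range(n)]

cal = tone(1.0, 8000)                                     # calibration signal, assumed 94 dB
speech = tone(0.5, 4000) + [0.0] * 4000 + tone(0.5, 4000)  # two "phrases" with a pause

edited = remove_pauses(speech, frame_len=400, floor=0.01)
with_pauses = level_db(speech, cal, 94.0)
without_pauses = level_db(edited, cal, 94.0)
print(f"with pauses: {with_pauses:.2f} dB, pauses removed: {without_pauses:.2f} dB")
```

The example shows why pauses were removed first: silence lowers the mean level of a scene even when the speech itself is unchanged.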


RESULTS  AND  DISCUSSION 

In  order  to  compare  voice  levels,  average  loudness  level  and  average  speech  sound 
pressure  (rms)  level  were  to  be  used  initially  as  the  dependent  variables.  A  sound  quality 
software  program  was  used  to  calculate  loudness  data.  However,  the  differences  between 
loudness  levels  were  very  similar  to  the  differences  between  rms  levels  as  measured  by  the 
signal-editing  software.  This  was  because  all  the  voices  had  very  similar  spectral  content. 
Therefore,  all  calculations  were  completed  and  are  reported  using  rms  data  that  were  easier 
and  faster  to  obtain. 


The  Effect  of  NVGs 

The  average  rms  levels  of  observers’  voices  are  reported  in  Tables  1  and  2.  The  data 
in  Table  1  were  obtained  for  verbalized  scene  content,  while  the  data  in  Table  2  were  obtained 
for  verbalized  pre-scene  identification  numbers.  The  analysis  of  rms  voice  levels  pertaining 
to  the  content  of  each  scene  revealed  no  significant  differences  between  any  of  the  viewing 
conditions. Averaged across all observers, the mean voice levels for the viewing conditions
ranged from 75.6 to 76.5 dBA, with standard deviations from 8 to 9 dB. The fact
that  observed  speech  levels  were  higher  than  the  typical  levels  for  conversational  speech 
(normally  60  to  65  dBA)  probably  indicated  that  the  observers  were  attempting  to 
communicate  with  the  unseen  person  (the  experimenter)  located  somewhere  behind  a 
partition  to  the  rear  of  the  observer’s  viewing  position. 


² dBA = weighted measure of sound pressure level using A-weighted sound pressure (ANSI 1994).

Table 1

Rms Voice Level (dBA) for Scene Content

                        Viewing condition
Subject     70NVG     40NVG     MOCK      HARN      NONE

   1         81.8      80.7      80.7      76.7      79.0
   2         82.6      83.6      83.9      84.8      82.4
   3         83.0      81.0      83.9      82.8      79.5
   4         79.0      80.5      80.1      81.5      78.6
   5         85.4      81.3      82.1      82.9      81.2
   6         70.9      70.1      71.7      72.7      72.9
   7         66.9      69.1      71.2      71.7      72.1
   8         76.5      80.1      77.6      80.8      72.8
   9         75.2      74.1      78.8      80.4      76.3
  10         80.3      75.8      77.0      80.3      80.0
  11         78.8      79.8      81.5      80.1      80.2
  12         77.5      79.3      77.8      77.9      77.3
  13         62.8      64.5      62.0      63.6      66.1
  14         57.9      58.1      60.2      64.3      60.2
  15         81.7      83.3      80.2      80.7      85.3
  16         79.2      78.5      76.2      76.5      78.0
  17         75.0      77.0      77.2      75.9      74.2
  18         84.4      84.1      84.5      84.1      84.2
  19         81.9      81.8      84.1      79.4      80.4
  20         52.8      52.1      53.4      53.2      51.8

 SUM       1513.6    1514.8    1524.1    1530.3    1512.5
 MEAN        75.7      75.7      76.2      76.5      75.6
 SD           9.0       8.8       8.6       8.0       8.2
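
The summary rows of Table 1 can be reproduced from the individual entries. The snippet below is an illustrative check on the 70NVG column only; it assumes, as the SUM row implies, that the dBA values were averaged arithmetically, and that the SD row is the sample (n - 1) standard deviation. The variable names are introduced here for illustration.

```python
import statistics

# 70NVG column of Table 1, subjects 1 through 20 (dBA).
level_70nvg = [
    81.8, 82.6, 83.0, 79.0, 85.4, 70.9, 66.9, 76.5, 75.2, 80.3,
    78.8, 77.5, 62.8, 57.9, 81.7, 79.2, 75.0, 84.4, 81.9, 52.8,
]

total = sum(level_70nvg)
mean = statistics.fmean(level_70nvg)
sd = statistics.stdev(level_70nvg)  # sample standard deviation (n - 1)

print(round(total, 1), round(mean, 1), round(sd, 1))
```

The between-observer spread (about 9 dB) is far larger than the differences between condition means (under 1 dB), which is why a within-subjects comparison is the appropriate one for these data.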

Likewise,  an  analysis  of  the  rms  voice  levels  of  the  pre-scene  identification  numbers 
revealed  no  differences  between  viewing  conditions.  The  average  voice  levels  ranged  from  76.1 
to  76.6  dBA,  with  standard  deviations  from  6.7  to  7.6  dB.  Of  specific  interest  is  that  the  voice 
levels  for  the  identification  numbers  (an  assumed  neutral  source  of  stimuli)  and  scene  content 
(an  assumed  highly  variable  source  of  stimuli)  differed  from  each  other  by  only  about  0.5  dB, 
on  average.  In  other  words,  their  rms  levels  were  almost  identical.  This  result  was  surprising  in 
view  of  the  fact  that  during  data  collection,  when  observers  were  describing  some  scenes,  they 
appeared  to  be  responding  quite  differently  to  provocative  actions  of  the  moving  soldier  in  the 
scene.  For  example,  if  the  moving  soldier  began  aiming  his  weapon  at  the  observer,  the 
response  was  to  speak  more  excitedly  in  contrast  to  a  more  monotonic  verbalization  of  other 
portions of the scene. Furthermore, this effect appeared to be unrelated to whether the
observer was wearing goggles. This observation called for an additional data analysis of peak
rms voice levels in scene reports rather than average rms voice levels, which could diminish the
effect of peaks. There was a possibility that a difference between viewing conditions might be
observed  in  speech  levels  of  single  isolated  words  or  phrases  uttered  in  response  to  a 
“perceived  threat.”  An  analysis  of  short-term  response  events  occurring  within  each  scene  was 
performed  on  all  scene  data.  A  short-term  response  event  was  defined  as  the  rms  level  of  a 
verbalized  phrase  of  2-second  duration  that  included  the  maximum  peak  vocalization  of  the 
scene.  These  data  are  shown  in  Table  3.  An  analysis  of  the  rms  peak  levels  similarly  revealed 
no  differences  between  viewing  conditions.  The  peak  levels  ranged  from  76.6  to  77.5  dBA, 
with  standard  deviations  from  8  to  9  dB. 
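
The short-term response event defined above can be illustrated with a minimal sketch. It assumes a calibrated pressure waveform in pascals, a 20 µPa reference pressure, and a 2-second window centered on the largest absolute sample; none of these details is stated in the report, and the A-weighting behind the reported dBA values is omitted. The function names are hypothetical.

```python
import math

P_REF = 20e-6  # assumed reference pressure in pascals (standard for dB SPL)


def rms_level_db(samples, p_ref=P_REF):
    """Return the rms level in dB re p_ref for a list of pressure samples.

    The samples must not all be zero, because log10(0) is undefined.
    """
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square / (p_ref * p_ref))


def peak_window_level_db(samples, sample_rate, window_s=2.0):
    """Return the rms level of a window_s-long segment containing the peak.

    Assumption: the window is centered on the largest absolute sample and
    shifted as needed so that it stays inside the recording.
    """
    n = int(round(window_s * sample_rate))
    if n >= len(samples):
        return rms_level_db(samples)
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    start = peak_index - n // 2
    start = max(0, min(start, len(samples) - n))
    return rms_level_db(samples[start:start + n])


if __name__ == "__main__":
    # Synthetic 10-second recording at 100 samples/s: quiet speech with a
    # louder 2-second burst. The values are illustrative only.
    rate = 100
    signal = [0.002] * 300 + [0.2] * 200 + [0.002] * 500
    print(round(rms_level_db(signal), 1))
    print(round(peak_window_level_db(signal, rate), 1))
```

Because the average rms level is computed over the whole scene, a brief loud phrase contributes little to it; the windowed measure isolates that phrase, which is the reason the peak analysis was added.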


Table 2

Rms Voice Level (dBA) for Identification Number

                      Viewing condition
Subject    70 NVG   40 NVG     MOCK     HARN     NONE
1            80.8     80.0     78.2     75.6     77.4
2            79.1     80.6     81.5     80.9     78.8
3            85.1     83.3     85.3     83.9     81.9
4            79.1     80.8     80.0     82.4     79.7
5            85.9     81.0     82.3     81.9     80.9
6            71.6     73.1     73.3     74.7     74.6
7            69.1     73.0     75.1     72.7     74.7
8            77.4     77.8     74.7     77.8     71.0
9            78.4     79.0     82.0     82.4     78.8
10           80.1     77.5     77.3     80.4     81.1
11           78.6     78.8     76.6     79.7     80.5
12           77.9     79.0     78.0     78.0     76.3
13           62.2     62.7     61.8     61.5     66.8
14           61.3     59.7     62.2     66.1     62.1
15           82.0     82.9     81.3     80.1     84.2
16           80.0     79.2     77.2     77.6     78.7
17           75.3     76.2     76.5     74.6     73.1
18           85.6     83.4     84.9     84.2     84.4
19           79.6     78.5     80.4     76.3     77.1
20           61.1     60.1     61.7     61.8     60.3
SUM        1530.2   1526.6   1530.3   1532.6   1522.4
MEAN         76.5     76.3     76.5     76.6     76.1
SD            7.6      7.3      7.1      6.7      6.7
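
The summary rows of Table 2 can be checked with a short sketch. It uses the 70 NVG column as transcribed from the table and assumes that the SD row is the sample standard deviation (n - 1 denominator) of the 20 per-subject dBA values; the report does not state which formula was used, although the sample formula appears consistent with the printed value.

```python
import statistics

# 70 NVG column of Table 2 (rms voice level in dBA, subjects 1 to 20).
nvg70 = [
    80.8, 79.1, 85.1, 79.1, 85.9, 71.6, 69.1, 77.4, 78.4, 80.1,
    78.6, 77.9, 62.2, 61.3, 82.0, 80.0, 75.3, 85.6, 79.6, 61.1,
]

total = sum(nvg70)
mean = statistics.mean(nvg70)
# Assumption: the table's SD is the sample standard deviation.
sd = statistics.stdev(nvg70)

print(f"SUM  {total:.1f}")
print(f"MEAN {mean:.1f}")
print(f"SD   {sd:.1f}")
```

Note that these are arithmetic means of decibel values, as in the table, not means of the underlying sound pressures.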



An additional question raised was whether either of the measures, the average rms voice
level or the peak rms voice level, was a suitable measure for discriminating between observed
levels of a “perceived threat.” To help answer this question, it was decided to closely examine
a  certain  portion  of  scenes  in  which  (during  data  collection)  observers  appeared  to  respond  to  a 
perceived  threat  with  increased  voice  level.  The  authors  wished  to  see  if  the  rms  voice  level 
measures  would  discriminate  between  verbal  responses  to  neutral  activity  and  verbal  responses 
to  more  provocative  actions  of  the  soldier  in  the  scene.  Since  such  a  thematic  factor  affecting 
voice  level  was  not  initially  considered  as  a  component  in  the  original  scene  selection  criteria,  all 
30  scenes  used  in  the  experiment  were  re-evaluated  for  the  presence  of  “perceived  threat”  after 
all  the  data  were  collected. 


Table 3

Rms Peak Voice Level (dBA) Within Scenes

                      Viewing condition
Subject    70 NVG   40 NVG     MOCK     HARN     NONE
1            83.6     82.5     83.0     77.5     80.6
2            82.2     85.2     85.0     84.1     82.2
3            83.6     81.6     84.7     84.6     81.6
4            80.4     81.4     81.7     83.2     80.3
5            87.0     82.9     83.1     83.7     82.3
6            71.0     71.6     72.5     72.2     73.9
7            68.3     69.4     72.4     72.2     73.2
8            78.6     82.5     78.3     83.9     75.4
9            76.5     74.9     80.7     81.7     77.7
10           82.1     78.3     78.0     82.7     81.9
11           80.4     80.3     83.4     82.2     82.3
12           78.8     81.6     78.8     79.5     77.7
13           65.4     66.1     63.5     65.4     68.4
14           58.6     57.6     62.1     65.7     60.5
15           82.8     83.6     79.9     80.7     85.0
16           78.6     78.7     76.4     76.1     78.5
17           74.5     77.1     77.4     75.5     73.7
18           84.3     83.7     83.9     84.1     83.4
19           82.3     82.1     83.9     79.7     81.3
20           54.0     53.9     54.5     54.3     52.7
SUM        1533.0   1535.0   1543.2   1549.0   1532.6
MEAN         76.7     76.8     77.2     77.5     76.6
SD            8.9      8.8      8.4      8.0      8.1
The Effect of Perceived Threat


Ten  people  who  were  not  participants  in  the  experiment  volunteered  to  serve  as 
judges.  Without  using  goggles,  they  were  shown  the  30  scenes  in  the  same  instructional 
context  as  the  observers  in  the  experiment  had  been  given,  that  is,  to  imagine  they  were 
covertly  observing  a  wooded  area  at  night  with  the  aid  of  a  night  vision  device.  Judges 
independently  rated  each  scene  from  1  to  5  on  the  basis  of  how  threatening  or  intimidating 
the  target  activity  in  the  scene  appeared  to  be  to  them.  A  rating  of  1  was  no  threat,  2  was 
low  threat,  3  was  moderate,  4  was  high,  and  5  was  very  high  threat.  Judges  reported 
afterward  that  they  primarily  rated  a  scene  based  on  whether  the  soldier  in  it  appeared  to 
actually  point  a  rifle  at  them.  Other  elements  such  as  the  nearness  of  the  soldier,  the  speed 
and  direction  of  his  movement,  and  how  visible  he  was  were  also  judged  to  be  important  to 
their  ratings.  Table  4  shows  the  threat  ratings  of  the  ten  judges. 

Table 4 shows that, of the 30 scenes rated, 19 achieved agreement on their ratings by a
majority of 6 to 10 of the raters. The initial efforts at balancing the distribution of dynamic
scene characteristics within each video cassette meant that each cassette contained at least one
scene at the rating of 5, the “very high threat” level, and all but one cassette contained at
least one scene at 2, the “low threat” level. In that one cassette, the scene chosen for use as
low threat, although not agreed upon by a clear majority of the raters, attained the lowest
average threat rating of 1.6. The voice data from these 10 scenes were analyzed in the same
manner as before for the effect of perceived threat on voice level. The nine remaining scenes
of high agreement among the judges, plus the 11 scenes with ratings of insufficient agreement
(fewer than six judges agreeing), were not used for the analysis.
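
The agreement criterion described above (a modal rating shared by at least 6 of the 10 judges) can be sketched as follows. The function name and the `min_agree` parameter are hypothetical; the two example rows are the ratings for scenes 9040 and 3017 as listed in Table 4.

```python
from collections import Counter


def modal_agreement(ratings, min_agree=6):
    """Return (mode, n) when at least min_agree judges gave the modal rating.

    Returns (None, n) when agreement is insufficient, where n is the number
    of judges who gave the most frequent rating.
    """
    mode, n = Counter(ratings).most_common(1)[0]
    return (mode, n) if n >= min_agree else (None, n)


if __name__ == "__main__":
    scene_9040 = [1, 1, 1, 1, 2, 1, 1, 1, 1, 1]
    scene_3017 = [2, 2, 3, 4, 3, 2, 3, 3, 2, 3]
    print(modal_agreement(scene_9040))  # nine judges rated this scene 1
    print(modal_agreement(scene_3017))  # most frequent rating has only 5 votes
```

With 10 judges and a threshold of 6, two different ratings can never both reach the threshold, so ties do not need special handling.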

Average  voice  levels  (in  dBA)  for  the  five  scenes  with  the  highest  (T5)  and  the  five 
scenes  with  the  lowest  (T2)  “perceived  threat”  are  shown  in  Tables  5  and  6.  A  small  but 
statistically  significant  difference  was  found  between  high  and  low  perceived  threat  scenes, 
based  on  rms  averages  of  the  entire  scene  content.  High  threat  scenes  were  approximately  0.9 
dB louder, on average, than low threat scenes, F(1, 19) = 12.395, p = 0.002. However, similar
differences were found for the pre-scene identification numbers. Identification numbers
associated with high threat scenes were 0.9 dB louder, on average, than identification numbers
associated with low threat scenes, F(1, 19) = 32.908, p < 0.001. Since the identification
number  always  preceded  the  scene,  it  was  impossible  for  it  to  reflect  any  differences  in  scene 
threat  ratings.  Therefore,  the  observed  differences  were  probably  spurious  and  cannot  be 
considered  as  resulting  from  differences  in  the  perceived  threat. 
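
The form of the test statistics reported above can be illustrated with a minimal sketch. For a within-subject factor with two levels, the F ratio with 1 and n - 1 degrees of freedom equals the square of the paired t statistic computed on the per-subject differences, which is consistent with F(1, 19) for 20 observers. The report does not describe its analysis software or model, so this is an assumption about the structure of the test, not a reproduction of it; the function name and the example data are hypothetical.

```python
import math


def paired_f(high, low):
    """Return (F, df1, df2) for a two-level within-subject comparison.

    high and low hold one value per subject (for example, each observer's
    mean level for high threat and low threat scenes). F is the squared
    paired t statistic, with 1 and n - 1 degrees of freedom. The differences
    must not all be identical, or the variance term is zero.
    """
    diffs = [h - l for h, l in zip(high, low)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t * t, 1, n - 1


if __name__ == "__main__":
    # Illustrative values only; these are not data from the report.
    high = [2.0, 4.0, 6.0, 9.0]
    low = [1.0, 2.0, 3.0, 5.0]
    f_value, df1, df2 = paired_f(high, low)
    print(f"F({df1}, {df2}) = {f_value:.3f}")
```

Because each observer serves as his or her own control, a small but consistent difference can reach significance even when the between-subject standard deviations are several decibels.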

An  analysis  of  peak  rms  voice  levels  for  the  scenes  with  the  highest  and  the  lowest 
“perceived  threat”  was  performed  as  well.  Peak  data  are  presented  in  Table  7.  No  significant 
differences  were  found  between  threat  levels  or  between  viewing  conditions.  Although  the 
means  were  similar  to  those  in  Table  5  for  average  rms,  apparently  the  variances  were  high 
enough  to  negate  any  differences  between  them. 

Based  on  these  data,  it  may  be  concluded  that  the  effect  of  perceived  threat  on  the 
observers’  voice  levels  during  the  present  study  conditions  could  not  be  demonstrated. 
However,  perceived  threat  was  clearly  reflected  in  changes  in  voice  quality  and  speech  rate. 
Future studies may determine whether any quantitative measures based on voice quality and
speech  rate  may  be  developed  to  assess  a  soldier’s  level  of  arousal. 


Table 4

Ratings by 10 Judges of Perceived Threat for 30 Scenes

Scene              Judged ratings
No.     sn  ab  tg  bv  jp  lw  eh jt>  kn  gk   Mean     SDa   Mode    N
3017     2   2   3   4   3   2   3   3   2   3    2.7   0.675
4031     1   1   2   2   3   1   2   3   3   2    2.0   0.816
9040     1   1   1   1   2   1   1   1   1   1    1.1   0.316      1    9
2106     5   5   5   5   5   5   3   5   5   5    4.8   0.632      5    9
4301     5   5   5   5   3   5   5   5   5   5    4.8   0.632      5    9
1452     1   1   2   2   2   2   2   2   1   3    1.8   0.632      2    6
7485     5   2   4   2   2   4   4   5   2   4    3.4   1.265
0159     5   5   5   5   4   5   5   5   5   5    4.9   0.316      5    9
6192     2   1   2   2   3   3   2   2   3   2    2.2   0.632      2    6
5263     2   2   3   3   3   3   3   4   4   3    3.0   0.667      3    6
4317     2   1   3   2   1   1   1   3   3   2    1.9   0.876
1340     5   5   5   5   5   5   5   5   5   5    5.0   0.000      5   10
6074     4   4   3   3   4   4   4   5   3   4    3.8   0.632      4    6
3471     1   1   1   1   1   1   2   5   1   1    1.5   1.269      1    8
7126     2   2   2   2   1   2   1   3   2   2    1.9   0.568      2    7
1529     5   3   4   3   4   4   3   5   4   3    3.8   0.789
1275     5   5   5   5   5   5   5   5   5   5    5.0   0.000      5   10
5360     2   3   2   2   2   3   2   3   2   2    2.3   0.483      2    7
2058     1   2   1   1   1   2   1   1   1   1    1.2   0.422      1    8
6091     3   2   3   2   2   3   3   4   2   2    2.6   0.699
5174     5   5   5   5   5   5   5   5   5   5    5.0   0.000      5   10
9180     2   3   2   2   2   3   2   2   3   2    2.3   0.483      2    7
3286     2   4   3   3   3   3   2   4   3   3    3.0   0.667      3    6
1467     2   3   3   4   3   3   2   4   2   3    2.9   0.738
8371     5   5   3   4   4   4   5   5   5   4    4.4   0.699
4021     5   4   5   5   4   3   5   5   4   3    4.3   0.823
5220     5   5   5   2   4   5   5   5   5   5    4.6   0.966      5    8
3117     5   2   5   5   5   5   5   5   2   5    4.4   1.265      5    8
6329     1   2   2   1   1   2   1   3   1   2    1.6   0.699
2534     4   1   3   5   5   3   3   5   3   3    3.5   1.269

aSD = standard deviation
Mode is listed only for scenes on which 6 to 10 judges gave the same rating; N is the number
of judges who gave the modal rating.


In  addition,  the  data  obtained  for  the  high  and  low  threat  scenes  were  consistent  with 
the  overall  data  reported  earlier  in  that  there  were  no  differences  between  viewing  conditions. 
The most reasonable conclusion is that the phenomenon, if it exists, of speaking louder when
NVGs  are  worn  could  not  be  reproduced  in  the  laboratory  experiment,  which  suggests  that  it 
is  not  related  to  the  physical  characteristics  of  the  viewing  device  alone. 


Table 5

Rms Voice Level (dBA) for Scene Content on High Threat (T5)
and Low Threat (T2) Scenes Only

                         Viewing conditions and threat level
             70 NVG          40 NVG           MOCK            HARN            NONE
Subject       T5      T2      T5      T2      T5      T2      T5      T2      T5      T2
1           83.6    82.1    81.6    80.7    80.6    79.9    76.4    77.5    80.2    78.6
2           81.9    81.0    84.2    82.4    86.0    84.3    86.5    84.8    83.8    81.4
3           82.1    83.4    81.6    81.0    84.6    84.0    84.1    82.0    78.9    78.7
4           80.0    77.2    81.2    81.8    81.0    78.6    82.2    80.2    79.0    77.7
5           85.3    84.7    82.1    81.2    83.0    81.8    83.3    83.1    81.8    80.3
6           71.0    70.6    69.6    70.1    71.7    70.8    73.1    73.3    73.7    73.5
7           66.8    64.2    69.4    70.0    71.7    70.6    71.0    70.0    71.5    71.5
8           79.8    75.6    83.8    79.5    80.3    78.3    81.7    79.5    76.1    72.1
9           76.5    74.7    73.7    74.4    79.0    77.6    81.0    80.0    77.2    76.4
10          80.4    81.7    77.6    75.0    77.8    77.0    81.7    80.9    79.0    80.1
11          78.6    79.7    81.1    79.6    81.1    81.9    82.4    82.1    81.1    81.0
12          78.8    76.6    80.1    78.6    78.8    78.5    78.5    77.9    78.7    77.2
13          62.6    64.6    64.8    64.2    62.5    61.1    63.5    64.0    66.5    67.8
14          59.2    58.9    58.5    58.5    60.7    61.1    63.3    64.3    61.0    60.7
15          83.1    82.3    83.6    84.2    81.4    80.6    81.2    81.0    85.3    84.8
16          80.0    77.8    80.0    78.0    77.3    76.5    78.0    78.1    78.5    78.5
17          75.7    74.2    77.9    78.0    76.7    77.8    76.6    76.5    75.6    74.0
18          83.4    85.2    86.9    83.7    84.7    84.7    84.4    83.5    83.4    85.0
19          85.4    77.6    85.9    81.1    86.2    84.0    83.8    78.2    82.2    80.3
20          54.5    52.9    50.0    51.3    54.8    52.9    55.3    55.0    52.5    52.7
SUM       1528.7  1505.0  1533.6  1513.3  1539.9  1522.0  1548.0  1531.9  1526.0  1512.3
MEAN        76.4    75.3    76.7    75.7    77.0    76.1    77.4    76.6    76.3    75.6
SD           8.9     8.9     9.7     8.8     8.7     8.7     8.3     7.7     8.1     7.9



Table 6

Rms Voice Level (dBA) for Identification Numbers on
High Threat (T5) and Low Threat (T2) Scenes Only

                         Viewing conditions and threat level
             70 NVG          40 NVG           MOCK            HARN            NONE
Subject       T5      T2      T5      T2      T5      T2      T5      T2      T5      T2
1           81.7    81.7    80.4    77.6    76.0    77.8    74.1    75.7    76.6    78.7
2           78.1    80.8    80.4    79.6    84.1    80.7    82.7    78.5    78.6    77.9
3           85.3    82.3    82.8    81.2    84.6    85.9    83.5    80.8    83.4    82.1
4           78.4    79.0    83.3    81.8    81.6    76.5    81.2    83.3    79.8    80.8
5           84.6    85.1    80.7    81.1    82.4    81.2    84.3    81.6    80.7    81.2
6           71.7    71.6    72.8    71.2    74.1    72.6    74.4    75.2    75.2    73.9
7           69.6    68.4    72.9    72.1    77.4    73.3    71.7    71.2    73.0    73.9
8           81.8    71.8    76.2    77.3    74.2    75.8    81.2    77.1    72.3    70.2
9           80.2    76.6    80.3    79.5    82.0    82.2    81.6    82.4    79.3    78.4
10          78.2    79.7    77.7    77.6    76.8    77.1    83.4    80.6    83.2    79.4
11          78.4    78.8    79.9    76.2    81.0    78.2    78.0    81.6    81.8    79.4
12          76.1    79.5    77.7    78.6    79.2    75.9    80.2    72.5    75.3    76.8
13          62.0    61.3    61.8    62.5    61.1    61.5    63.1    59.1    63.8    64.0
14          60.2    60.6    60.9    59.1    61.1    62.2    64.4    62.8    60.9    63.7
15          82.1    80.4    84.4    85.0    81.1    80.9    81.8    80.3    85.2    82.4
16          79.9    80.4    80.6    78.1    79.1    76.5    78.7    79.0    79.3    79.2
17          75.0    74.7    76.5    74.8    75.7    77.1    75.8    72.8    73.6    72.7
18          86.6    84.6    82.1    83.3    85.0    84.3    86.1    83.1    86.0    84.9
19          80.6    79.8    78.8    80.3    81.3    80.6    76.7    75.4    77.1    79.0
20          62.2    59.3    59.9    60.1    61.1    62.8    63.3    62.4    62.4    59.2
SUM       1532.7  1516.4  1530.1  1517.0  1538.5  1523.1  1546.2  1515.4  1527.5  1517.8
MEAN        76.6    75.8    76.5    75.9    77.0    76.2    77.3    75.8    76.4    75.9
SD           7.7     7.9     7.4     7.4     7.6     6.9     7.0     7.2     7.2     6.9



Table 7

Rms Peak Voice Levels (dBA) in High Threat (T5) and Low Threat (T2) Scenes Only

                         Viewing conditions and threat level
             70 NVG          40 NVG           MOCK            HARN            NONE
Subject       T5      T2      T5      T2      T5      T2      T5      T2      T5      T2
1           84.6    85.1    80.9    82.8    81.8    83.9    77.0    76.8    81.7    79.4
2           86.8    82.8    85.2    84.3    87.7    83.5    85.2    78.1    85.7    81.8
3           85.0    82.9    82.7    83.4    86.1    88.8    86.9    81.8    81.0    81.2
4           81.3    78.8    81.3    83.5    85.2    79.8    84.0    82.1    82.1    79.8
5           89.6    86.1    85.1    83.6    85.1    81.0    86.0    83.8    82.5    83.0
6           70.5    72.1    70.5    73.9    71.2    71.5    73.8    72.4    74.7    76.1
7           66.2    64.9    68.4    71.3    73.1    72.8    71.1    70.5    73.3    74.8
8           84.4    77.3    88.9    81.3    81.8    79.3    84.4    80.7    79.1    75.6
9           78.2    75.8    74.4    70.9    80.2    79.8    82.0    83.5    76.6    76.7
10          83.4    83.1    81.1    78.1    79.9    77.8    82.1    83.5    81.3    81.0
11          81.7    81.2    83.1    79.0    82.8    84.0    84.9    82.9    82.8    83.8
12          78.0    78.5    84.1    81.9    78.8    79.0    82.7    78.1    75.9    79.5
13          65.5    67.3    69.3    67.6    64.7    63.9    66.8    65.3    67.1    69.4
14          61.0    59.4    58.8    60.4    61.8    65.0    65.1    64.9    61.7    62.1
15          83.4    83.9    81.8    85.3    80.7    79.7    81.0    80.4    84.8    84.3
16          78.0    79.4    81.0    79.8    77.6    76.6    75.9    78.1    80.4    79.1
17          76.1    74.9    77.9    79.6    77.6    77.7    77.6    76.8    74.5    75.2
18          83.5    85.7    86.9    83.8    82.3    83.8    82.1    84.4    81.5    83.4
19          87.3    76.6    90.9    82.1    90.7    84.4    85.3    79.2    82.5    79.6
20          56.6    52.9    52.8    53.2    55.0    57.0    57.5    56.2    53.3    54.9
SUM       1561.1  1528.7  1565.1  1545.8  1564.1  1549.3  1571.4  1539.5  1542.5  1540.7
MEAN        78.1    76.4    78.3    77.3    78.2    77.5    78.6    77.0    77.1    77.0
SD           9.3     9.1     9.9     8.7     9.0     7.9     8.0     7.5     8.2     7.4

GENERAL  DISCUSSION 

The  primary  objective  of  this  experiment  was  to  determine  if  the  phenomenon  of 
increased  voice  level  associated  with  NVG  use  could  be  reproduced  in  the  laboratory  during 
more  controlled  conditions  than  would  be  available  in  the  field.  If  so,  it  would  provide 
information  about  whether  the  phenomenon  had  any  basis  in  the  purely  physical  aspects  of 
using  NVG  displays.  The  authors  were  not  able  to  capture  the  phenomenon  in  the 
laboratory.  There  are  certain  critical  factors,  however,  that  exist  in  field  situations  and  create 
the  context  of  the  original  reports  of  the  phenomenon.  First,  in  the  laboratory,  the  observer 
cannot  move  through  the  terrain.  The  active  physical  effort  exerted  while  performing  such  a 
task could influence voice level. Second, the space surrounding the observer in the laboratory
cannot  be  totally  darkened,  as  it  would  be  outdoors  at  night,  because  of  the  reflected  light 
spill  of  the  video  projection  method  used  to  present  the  stimuli.  For  an  effect  of  attentional 
tunneling  to  take  place,  sufficient  perceptual  isolation  of  the  central  task  of  examining  a  bright 
display  in  the  darkness  of  the  surrounding  environment  may  be  necessary.  Third,  the  stress 
of  a  real  viewing  situation  cannot  be  duplicated  in  the  laboratory.  The  NVG  user  on  a  stealth 
night  mission,  playing  the  role  of  “the  eyes  of  the  squad,”  is  in  a  highly  stressful  situation  by 
definition.  Just  walking  through  the  woods  at  night  while  wearing  the  NVGs,  with  a  limited 
FOV,  trying  not  to  trip  on  ground  obstacles  or  to  be  hit  in  the  face  by  branches  of  trees  is  not 
a  casual  task.  Stress  is  also  increased  by  the  possible  detection  by  an  enemy  and  the 
responsibility  of  the  NVG  user  for  the  survival  and  safety  of  the  other  squad  members. 
Perhaps  stress  itself  may  indirectly  influence  voice  level  through  an  effect  on  general  muscle 
tension,  thereby  making  voice  level  control  more  difficult.  Also  consider  that  the  rest  of  the 
squad normally do not wear NVGs and are following the lead observer. They have to play
the  role  of  “the  ears  of  the  squad”  and  pay  extra  attention  to  the  sound  of  the  surroundings. 
That  factor  and  the  fear  of  discovery  by  the  enemy  may  create  a  situation  in  which  even  the 
softest  verbal  utterance  by  the  NVG  user  might  sound  too  loud  to  the  listener.  This 
constitutes  another  possible  facet  of  the  phenomenon.  The  NVG  wearer  may  or  may  not 
speak  louder  than  normal  in  a  stealth  context,  but  the  same  context  may  also  contribute  to  the 
phenomenon  by  causing  the  speaker  to  sound  louder  than  normal  to  the  listener. 

A secondary objective, constructing a dynamic visual target acquisition task
involving targets at close range, did prove successful in immersing observers who wore NVGs
in the scene context. Observer behavior such as responding to a perceived threat can offer more
insight into the kinds of scenarios that soldiers might face in the field. Visual target
acquisition  tasks  typically  used  in  a  laboratory  are  static,  single-frame  search  tasks  that  may 
not  sufficiently  involve  the  observer.  Even  when  natural  scenes  are  used,  the  observer’s  task 
is  usually  to  search  for  and  to  find  a  stationary  target  (typically  a  distant  vehicle)  and  then  to 
be  presented  with  a  new  scene.  In  a  more  realistic  target  acquisition  task  in  which  the  natural 
scene  is  continually  changing,  there  is  a  sense  of  time  passing,  and  the  dynamics  of  continual 
scene change and the uncertainty of a target appearance may be more realistic and more
formidable  factors  which  affect  human  expectancy  in  target  acquisition  performance.  Such 
dynamic  elements  should  be  used  in  future  scene  development  for  target  acquisition  studies. 
The  authors  also  learned  that  when  “targets”  (e.g.,  other  soldiers)  are  near  enough  to  the 
observer  to  elicit  an  immediate  response,  not  only  the  presence  of  the  target  but  also  the 
nature  of  its  activity  and  context  should  be  considered  as  factors  in  more  realistic 
presentations. 

Further  investigation  of  the  phenomenon  of  louder  speech  level  associated  with  the 
wearer  of  NVGs  is  planned  for  the  near  future.  The  authors  hope  that  the  source  of  the 
phenomenon  can  be  found  outdoors  in  a  realistic  field  scenario  involving  the  uncertainty  of  a 
natural  environment,  the  isolation  of  nighttime  darkness,  and  the  stress  of  being  discovered. 
INSTRUCTIONS


(To  be  read  to  the  observer,  with  some  notes  on  Procedures). 

We’re  interested  in  seeing  how  soldiers  find  targets. 

In  this  experiment,  we  want  to  see  how  soldiers  find  targets  wearing  night  vision  goggles,  and 
we’ll  do  this  in  this  room. 

I’ll  show  you  some  short  scenes  of  wooded  areas,  woods.  You  need  to  imagine  that  you’re 
actually  in  the  scenes,  in  the  woods,  watching  what’s  going  on. 

There  may  be  one  or  more  soldiers,  or  no  soldiers,  in  the  woods  and,  the  soldiers  may  be 
motionless  or  moving. 

Imagine  YOU  are  the  lead  observer  in  a  Special  Forces  Team.  You’re  on  a  night  mission  with 
other  squad  members  following  behind  you. 

It’s  YOUR  responsibility  to  observe  any  military  activity  you  might  see  and  to  describe  it  to 
the  next  squad  member  behind  you. 

You’re  the  only  soldier  in  the  squad  wearing  night  vision  goggles  so  you’re  the  eyes  of  the 
squad. 

You  must  not  only  notice  as  much  as  possible  of  what  might  be  going  on  out  there,  but  must 
also  communicate  this  information  to  another  person  who  can’t  see  what  you  see. 

You  need  to  report  what  you  see,  quickly,  AS  YOU  SEE  IT,  not  after  you’ve  seen  it. 

MOST  important  is  how  ACCURATE  you  are  in  describing  what  you  see.  It  could  affect 
everybody’s  survival. 

SOMETIMES,  it  might  look  like  a  soldier  you  see  in  the  scene  is  doing  something  with  a 
weapon.  We’d  like  you  to  report  what  direction  you  think  the  soldier  is  aiming  at,  to  your 
left,  or  right,  or  maybe  at  you  and,  how  far  he  is  from  you. 

SOMETIMES,  the  soldier  may  be  moving  left  or  right  or  toward  you.  You  need  to  report 
which  way  he’s  moving  and  at  what  distance. 

It  might  SOMETIMES  look  like  YOU  were  noticed  by  the  soldier  you  see.  If  you  think  that 
he  saw  you,  you  need  to  report  that  also. 

SOMETIMES,  a  soldier  or  soldiers  may  be  just  standing  there.  You  need  to  report  where 
they  are  and  at  what  distance. 

AND,  SOMETIMES,  it  might  look  like  there’s  nothing  at  all  going  on.  You  just  report 
there’s  no  military  activity  or  it’s  all  clear,  and  it’s  safe  to  continue. 

Any  questions  so  far,  about  what  we’re  asking  you  to  do? 

For  this  experiment,  you  wear  three  different  goggles  or  no  goggles. 
You stand on this platform to watch the scenes.

You  see  6  scenes  for  each  set  of  goggles  you  wear  or  while  not  wearing  any  goggles. 

Some  of  the  goggles  are  night  vision  goggles. 

We’ll  be  testing  5  conditions,  with  and  without  goggles.  I’ll  tell  you  more  later. 

Remember,  we  want  to  compare  how  it  is  to  find  a  target  wearing  goggles  with  how  it  is  to 
find  a  target  without  wearing  goggles. 

Any  questions? 

Here’s  the  procedure: 

FIRST,  you  see  a  4-digit  number  in  front  of  you.  It’s  large. 

You  can’t  miss  it. 

You  tell  me  those  digits. 

The  digits  are  visible  for  about  two  or  three  seconds. 

Tell me them one digit at a time, like 4-1-8-3.

Don’t  wait  until  they’re  gone.  Tell  me  while  they’re  there. 

I’ll  be  standing  near  the  wall  behind  you,  in  back  of  this  partition,  writing  the  number  you 
say,  so  that  we’re  certain  which  scene  you’re  looking  at. 

THEN,  about  two  seconds  after  the  numbers  disappear,  the  scene  begins. 

YOUR  job  is  to  imagine  YOU  are  actually  in  those  woods,  observing  anything  of  military 
importance  and  describing  what  you  see,  AS  YOU  SEE  IT,  to  another  squad  member,  me, 
standing  behind  you. 

If  you  see  a  soldier  in  the  scene: 

FIRST; tell me his LOCATION: “soldier, 9 o’clock” (or 3 o’clock, or 12 o’clock, wherever the
soldier is...) (EXPLAIN TO S)

THEN;  tell  me  his  DISTANCE  from  you:  in  meters  or  feet  or  yards,  whatever  unit  you  like, 
but  use  the  same  unit  each  time.  Make  your  best  guess  of  the  distance. 

THEN; in any order, tell me his ACTIVITY:
which  way  he’s  moving,  if  he  is,  OR  if  he’s  not  moving, 
which  way  you  think  a  weapon  is  pointing,  if  you  see  one, 
and,  judging  from  his  behavior,  if  you  think  HE  sees  YOU. 

AFTER  about  20  seconds,  the  scene  disappears. 

That’s  the  end  of  it.  I’ll  be  writing  down  the  information  you  gave  me.  I  won’t  be  talking  to 
you  during  the  tests  while  you’re  seeing  the  scenes. 
The next scene begins in 5 or 10 seconds.

You  see  a  new  four-digit  number,  which  you  tell  me. 

After  2  more  seconds,  the  new  scene  begins,  you  tell  me  what  you  see,  the  scene  ends,  and  so 
on.  In  reality,  the  scene  would  be  continuous  and  you’d  continuously  report  what  you’re 
seeing  if  it’s  important. 

After  you  see  6  scenes,  we  change  conditions  and  look  at  another  set  of  6  scenes.  When 
you’ve  looked  through  all  the  goggles,  and  looked  at  some  scenes  without  goggles,  you’ll  be 
finished  with  the  experiment. 

This  whole  procedure  should  take  about  20  minutes. 

Any  questions? 


Before  we  start  the  experiment,  I’ll  show  you  a  few  scenes  so  you  can  get  an  idea  of  the  task 
you’re  being  asked  to  do. 

(MIKE  ATTACH:) 

This  is  for  later  on,  when  we  want  to  analyze  what  kind  of  language  people  use  in  their 
descriptions  of  what  they  see  and,  in  case  I  miss  writing  down  some  important  notes  on  your 
descriptions. 

(ON  PLATFORM:) 

Remember,  I  can’t  see  any  part  of  the  scene  that  you  see. 

Describe  to  me  everything  that  you  think  is  important. 

TO  REPEAT,  four  digits.  Tell  me  what  they  are,  one  at  a  time. 

THEN,  the  scene.  Start  telling  me  what  you  see. 

LOCATION:  If  there’s  a  soldier,  moving  or  not,  tell  me  where  he  is,  from  9  o’clock  to  3 
o’clock. 

DISTANCE:  Tell  me  how  far  away  he  is. 

ACTIVITY:  Tell  me  what  he’s  doing.  If  he  has  a  weapon  which  way  is  he  pointing  it? 

If  he’s  moving,  what  direction.  Is  he  not  moving? 

Did  he  see  you? 

O.K.?  Here’s  the  first  scene.  After  it  we  stop  for  questions. 

(SHOW  SAMPLE  1:) 

Any  questions?  That  was  pretty  good. 

Remember, if you see a soldier in the woods, moving or not, report
Location,  Distance,  and  Activity. 
Let’s try another scene. Watch for the numbers.

Here  it  is. 

(SHOW  SAMPLE  2:) 

Location,  Distance,  Activity. 

O.K.  Good.  Any  questions? 

During  the  actual  experiment,  I  won’t  be  talking  to  you. 

Let’s  try  one  more  scene.  This  time  I  won’t  prompt  you.  See  how  much  you  can  report 
accurately  to  me,  without  my  saying  anything. 

Ready?  Here’s  the  scene. 

(SHOW  SAMPLE  3:) 

Any  questions? 

We’re  now  ready  to  begin  the  actual  experiment. 